Integrate deployment metadata service for locking and state #4856

Open: shreyas-goenka wants to merge 15 commits into main from shreyas-goenka/deployment-metadata-service
shreyas-goenka (Contributor) commented Mar 26, 2026

Summary

Integrates the Deployment Metadata Service (DMS) as an alternative backend for deployment locking and resource state management. Gated behind DATABRICKS_BUNDLE_MANAGED_STATE=true.

When enabled:

  • Locking: Uses server-side versioned locks (with heartbeats) instead of workspace filesystem lock files
  • State: Reads/writes resource state via the DMS API (ListResources / CreateOperation) instead of local state files
  • Operations: Reports each resource operation (create, update, delete) inline to the server with resource state
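The backend switch described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the method set on `DeploymentLock`, the `Backend` helper, and the `newDeploymentLock` factory are assumptions made for the example, and it assumes only the literal value `true` enables DMS.

```go
package main

import (
	"fmt"
	"os"
)

// DeploymentLock is a simplified, hypothetical version of the interface
// named in this PR; the real method set may differ.
type DeploymentLock interface {
	Acquire() error
	Release() error
	Backend() string // illustrative helper, not taken from the PR
}

// workspaceFilesystemLock stands in for the existing lock-file behavior.
type workspaceFilesystemLock struct{}

func (workspaceFilesystemLock) Acquire() error  { return nil } // would write a lock file
func (workspaceFilesystemLock) Release() error  { return nil } // would remove the lock file
func (workspaceFilesystemLock) Backend() string { return "workspace-filesystem" }

// metadataServiceLock stands in for the server-side (DMS) lock.
type metadataServiceLock struct{}

func (metadataServiceLock) Acquire() error  { return nil } // would create a server-side version
func (metadataServiceLock) Release() error  { return nil } // would complete that version
func (metadataServiceLock) Backend() string { return "dms" }

// newDeploymentLock picks a backend from the environment variable's value.
// Assumption: only the literal "true" selects DMS.
func newDeploymentLock(managedState string) DeploymentLock {
	if managedState == "true" {
		return metadataServiceLock{}
	}
	return workspaceFilesystemLock{}
}

func main() {
	lock := newDeploymentLock(os.Getenv("DATABRICKS_BUNDLE_MANAGED_STATE"))
	if err := lock.Acquire(); err != nil {
		fmt.Println("acquire failed:", err)
		return
	}
	defer lock.Release()
	fmt.Println("lock backend:", lock.Backend())
}
```

Keeping both backends behind one interface lets the deploy and destroy phases stay unaware of which one is active.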

Key implementation details

  • DeploymentLock interface (lock.go) with two implementations: workspaceFilesystemLock (existing behavior) and metadataServiceLock (DMS)
  • resolveDeploymentID reads the deployment ID from the workspace resources.json, or generates a new UUID for fresh deployments (the ID is written only after CreateDeployment succeeds)
  • LoadStateFromDMS populates the state DB from ListResources instead of reading local files
  • PushResourcesState is a no-op with DMS (state is persisted per-operation to the server)
  • The --plan flag and bind/unbind are not supported with DMS
  • Heartbeat goroutine keeps the lock alive during long deployments

Test plan

  • Acceptance tests under acceptance/bundle/dms/ covering: deploy with resource creation, sequential deploys with create/delete, plan + summary, deploy errors, and lock release errors
  • Unit test for planActionToOperationAction mapping
  • E2E testing against staging workspace (32/32 passing)

eng-dev-ecosystem-bot (Collaborator) commented Mar 26, 2026

Commit: 342fef8

Run: 23764962616

| Env | ✅ pass | 🙈 skip | Time |
|---|---|---|---|
| ✅ aws linux | 73 | 14 | 0:46 |
| ✅ aws windows | 73 | 14 | 0:32 |
| ✅ aws-ucws linux | 76 | 11 | 0:41 |
| ✅ aws-ucws windows | 76 | 11 | 0:30 |
| ✅ azure linux | 72 | 14 | 0:49 |
| ✅ azure windows | 72 | 14 | 0:39 |
| ✅ azure-ucws linux | 75 | 11 | 0:43 |
| ✅ azure-ucws windows | 75 | 11 | 0:36 |
| ✅ gcp linux | 73 | 14 | 0:42 |
| ✅ gcp windows | 73 | 14 | 0:31 |


// Report skip actions to the metadata service. On initial registration,
// these are recorded as INITIAL_REGISTER operations.
if action == deployplan.Skip && b.OperationReporter != nil {

Review comment from the PR author:

move the initial registration up

@@ -0,0 +1,6 @@
Local = true
Cloud = false

Review comment from the PR author:

The service needs to roll out to prod before we enable this on cloud.

@shreyas-goenka force-pushed the shreyas-goenka/deployment-metadata-service branch 11 times, most recently from 4bbbe9c to 7b26260, on April 14, 2026 21:15

assert.True(t, ok)
assert.Equal(t, tmpdms.VersionTypeDestroy, vt)

_, ok = goalToVersionType(GoalBind)

Review comment from the PR author:

Support for bind can be added as a follow-up.
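For context, the `(value, ok)` shape exercised by the test snippet above could look like the following sketch. Only `GoalBind`, `VersionTypeDestroy`, and the function name `goalToVersionType` appear in this PR; the local `Goal` and `VersionType` types, `GoalDeploy`, `GoalDestroy`, `VersionTypeDeploy`, and all string values are assumptions made so the example compiles on its own.

```go
package main

import "fmt"

// Goal and VersionType are hypothetical stand-ins for the PR's types.
type Goal string
type VersionType string

const (
	GoalDeploy  Goal = "deploy"  // assumed name and value
	GoalDestroy Goal = "destroy" // assumed name and value
	GoalBind    Goal = "bind"    // name from the PR, value assumed

	VersionTypeDeploy  VersionType = "DEPLOY"  // assumed name and value
	VersionTypeDestroy VersionType = "DESTROY" // name from the PR, value assumed
)

// goalToVersionType maps a goal to a version type. The second return value
// is false for goals that have no server-side version type, such as bind.
func goalToVersionType(g Goal) (VersionType, bool) {
	switch g {
	case GoalDeploy:
		return VersionTypeDeploy, true
	case GoalDestroy:
		return VersionTypeDestroy, true
	default:
		return "", false
	}
}

func main() {
	for _, g := range []Goal{GoalDeploy, GoalDestroy, GoalBind} {
		if vt, ok := goalToVersionType(g); ok {
			fmt.Printf("%s -> %s\n", g, vt)
		} else {
			fmt.Printf("%s has no version type\n", g)
		}
	}
}
```

Returning `false` rather than an error keeps the unsupported case cheap to check, and adding bind later would only need one more `case`.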

@shreyas-goenka shreyas-goenka marked this pull request as ready for review April 15, 2026 00:08
@shreyas-goenka shreyas-goenka requested review from andrewnester and pietern and removed request for andrewnester and pietern April 15, 2026 00:09
@shreyas-goenka shreyas-goenka requested a review from denik April 15, 2026 00:09
github-actions commented

Approval status: pending

/acceptance/bundle/ - needs approval

Files: acceptance/bundle/dms/add-resources/databricks.yml, acceptance/bundle/dms/add-resources/out.test.toml, acceptance/bundle/dms/add-resources/output.txt, acceptance/bundle/dms/add-resources/script, acceptance/bundle/dms/add-resources/test.toml, acceptance/bundle/dms/deploy-error/databricks.yml, acceptance/bundle/dms/deploy-error/out.test.toml, acceptance/bundle/dms/deploy-error/output.txt, acceptance/bundle/dms/deploy-error/script, acceptance/bundle/dms/deploy-error/test.toml, acceptance/bundle/dms/plan-and-summary/databricks.yml, acceptance/bundle/dms/plan-and-summary/out.test.toml, acceptance/bundle/dms/plan-and-summary/output.txt, acceptance/bundle/dms/plan-and-summary/script, acceptance/bundle/dms/release-lock-error/databricks.yml, acceptance/bundle/dms/release-lock-error/out.test.toml, acceptance/bundle/dms/release-lock-error/output.txt, acceptance/bundle/dms/release-lock-error/script, acceptance/bundle/dms/release-lock-error/test.toml, acceptance/bundle/dms/sequential-deploys/databricks.yml, acceptance/bundle/dms/sequential-deploys/out.test.toml, acceptance/bundle/dms/sequential-deploys/output.txt, acceptance/bundle/dms/sequential-deploys/script, acceptance/bundle/dms/sequential-deploys/test.toml, acceptance/bundle/dms/test.toml
Suggested: @denik
Also eligible: @pietern, @andrewnester, @anton-107, @janniklasrose, @lennartkats-db

/bundle/ - needs approval

Files: bundle/bundle.go, bundle/deploy/lock/acquire.go, bundle/deploy/lock/deployment_metadata_service.go, bundle/deploy/lock/deployment_metadata_service_test.go, bundle/deploy/lock/lock.go, bundle/deploy/lock/release.go, bundle/deploy/lock/workspace_filesystem.go, bundle/deployplan/plan.go, bundle/direct/bundle_apply.go, bundle/direct/bundle_plan.go, bundle/direct/pkg.go, bundle/env/deployment_metadata.go, bundle/phases/bind.go, bundle/phases/deploy.go, bundle/phases/destroy.go, bundle/statemgmt/resources_json.go, bundle/statemgmt/state_dms.go, bundle/statemgmt/state_pull.go, bundle/statemgmt/state_push.go
Suggested: @denik
Also eligible: @pietern, @andrewnester, @anton-107, @janniklasrose, @lennartkats-db

/cmd/bundle/ - needs approval

Files: cmd/bundle/utils/process.go
Suggested: @denik
Also eligible: @pietern, @andrewnester, @anton-107, @janniklasrose, @lennartkats-db

General files (require maintainer)

Files: acceptance/bin/print_requests.py, libs/testserver/deployment_metadata.go, libs/testserver/fake_workspace.go, libs/testserver/handlers.go, libs/testserver/server.go, libs/tmpdms/api.go, libs/tmpdms/types.go
Based on git history:

  • @denik -- recent work in bundle/direct/, libs/testserver/, bundle/statemgmt/

Any maintainer (@andrewnester, @anton-107, @denik, @pietern, @simonfaltum, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.

Add server-side deployment locking and state management via the
Deployment Metadata Service (DMS), gated behind DATABRICKS_BUNDLE_MANAGED_STATE=true.

Key changes:
- DeploymentLock interface with factory (DMS or filesystem based on env)
- DMS lock: version-based locking with heartbeat, operation reporting
- State read/write via ListResources/CreateOperation with per-resource state
- withDeploymentLock helper extracts lock boilerplate from deploy/destroy
- Temporary DMS client (libs/tmpdms) mirroring future SDK-generated code
- Mock DMS server for acceptance tests
- 6 acceptance tests covering deploy, destroy, plan, summary, sequential
  deploys, and adding resources with remote state

Co-authored-by: Isaac

LoadStateFromDMS is a state-loading function, not a lock function.
Moving it to statemgmt where it belongs alongside other state
management code.

Co-authored-by: Isaac

When we just created the deployment, LastVersionID is necessarily
empty so we can start at version "1" directly.

Co-authored-by: Isaac

Print requests inline in output.txt and clear remaining requests at
the end of each script so out.requests.txt is not generated.

Also update sequential-deploys test to add/remove resources across
deploys, asserting create and delete operations are captured.

Co-authored-by: Isaac

…erations

- Print DMS requests inline in output.txt via print_requests.py
- Update sequential-deploys to test create/delete across deploys
- Add protoLogs replacement to stabilize flaky telemetry timing
- Regenerate out.requests.txt golden files

Co-authored-by: Isaac

If CreateDeployment fails, the workspace should not contain a dangling
deployment ID pointing to a non-existent server record.

Co-authored-by: Isaac

The old lock.Acquire mutator checked for fs.ErrPermission and
fs.ErrNotExist and reported possible permission denied errors.
This was lost when refactoring to the DeploymentLock interface.

Co-authored-by: Isaac

Add print_requests.py cleanup at the end of each script to clear
remaining recorded requests, preventing out.requests.txt from being
generated as a golden file. DMS requests are already printed inline
in output.txt.

Co-authored-by: Isaac

Open out.requests.txt with explicit utf-8 encoding to handle
non-ASCII characters in request bodies.

Co-authored-by: Isaac

Regenerated with Python 3.11 after fixing the UnicodeDecodeError.
The output.txt files now contain the inline DMS request assertions
without the Python traceback errors.

Co-authored-by: Isaac

@shreyas-goenka force-pushed the shreyas-goenka/deployment-metadata-service branch from 7339f0b to 65c2aae on April 15, 2026 00:10